16 March, 2016

GIS

Geographic Information Systems

GIS

Geographic Information Systems

  • incorporating
  • storing
  • manipulating
  • analyzing
  • displaying…

Spatial Data

What is spatial data?

Nonspatial data has no location information

nonspatial = data.frame(
  id=c(1,2,3,4),
  data=rnorm(4)
)
print(nonspatial)
##   id        data
## 1  1 -0.03994901
## 2  2  0.37952812
## 3  3 -1.03246442
## 4  4 -0.25787322

What is spatial data?

Spatial data has location information

The simplest spatial data are points on a map

spatial = data.frame(
  id=c(1,2,3,4),
  data=rnorm(4),
  x=runif(4,-180,180),
  y=runif(4,-90,90)
)
print(spatial)
##   id       data          x         y
## 1  1  0.7772441  158.92329  51.94688
## 2  2  0.7718913  -61.10863  11.97929
## 3  3 -0.7235475   31.81460 -10.14776
## 4  4  1.9104498 -174.60919 -40.88082

What is spatial data?

Which we can convert to explicitly spatial data using the sp package. Most GIS packages in R store data as sp classes.

library(sp)

What is spatial data?

The sp package has a method called coordinates that converts points to an sp class.

coordinates(spatial) = ~ x + y
class(spatial)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
plot(spatial, axes=T)


What is spatial data?

Spatial data also needs a reference system or "projection" that allows us to represent spatial features on a map. Projections can be thought of as simply a coordinate system with an origin that is relative to a known point in space.

This is a whole field of mathematically intensive study termed "geodesy"

Much of the field of geodesy is jam-packed in the rgdal package, which is a wrapper for the Geospatial Data Abstration Library

library(rgdal)

What is spatial data?

rgdal includes a comprehensive list of projections that are typically represented as a string of parameters.

The most common is our standard latitude/longitude system, where the coordinates are angular and the origin is the equator directly south of Greenwich, England. The simplest projection string to denote this projections is:

"+proj=longlat"

To define the projection for spatial, we write to its proj4string slot:

proj4string(spatial) = "+proj=longlat"

Projections are a necessary evil for GIS users (to be continued)

What is spatial data?

With a projection associated with our spatial data, we can now relate it to other spatial data. In other words, let's make a map!

library(leaflet)
m = leaflet(data=spatial) %>%
  addTiles() %>%
  addMarkers()
m

What is spatial data?

Spatial data types

Spatial data types

Two main types: vector and raster

Vector = Polygons
Raster = Grid

Vector = Discrete
Raster = Continuous

Vector = Illustrator/Inkscape
Raster = Photoshop/GIMP







Vector Data

Intro

Vector data is commonly used for phenomona that have discrete boundaries, e.g., fire hydrants, streets, and buildings. Vector data can be quickly summarized as shapes, specifically points, lines and polygons. Each of these are described by coordinate pairs (an X and Y) that describe either the point, trace the path of the line or enclose a polygon.







Table here of pts, lines, polys and example graphic examples of each grid

Vector Data Terminology

Vector data will commonly have more than one shape, such as 72 counties in Wisconsin or a stream dataset. Each county is a feature (might hear the collection referred to as a featureclass). Dataset attached to the spatial data is called the attribute table (not R terminology but generally GIS users will use this term). Each feature will have one row of the attribute data.

Vector I/O

library(rgdal)

Vector I/O

soils = readOGR(dsn="data", layer="soilsData")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "soilsData"
## with 75 features
## It has 27 fields
writeOGR(
  soils,
  "data",
  "soilsData_out",
  driver="ESRI Shapefile"
)

Vector data structure

Some helper functions

class(soils)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
slotNames(soils)
## [1] "data"        "polygons"    "plotOrder"   "bbox"        "proj4string"
length(soils)
## [1] 75

Vector data structure

Take a peak here at the top of our attribute table

str(soils@data[,1:10])
## 'data.frame':    75 obs. of  10 variables:
##  $ mukey  : Factor w/ 25 levels "2774742","2774772",..: 12 19 12 6 9 5 7 24 25 13 ...
##  $ muarcrs: Factor w/ 75 levels "0.40538405","0.90105194",..: 30 62 70 18 8 26 19 5 15 23 ...
##  $ Sand1  : num  21.5 11 21.5 12 29.5 ...
##  $ Sand2  : num  35.29 8.34 35.29 9.02 38.22 ...
##  $ Sand3  : num  45.09 7.56 45.09 16.59 64.52 ...
##  $ Sand4  : num  48.6 30.6 48.6 36.7 33.2 ...
##  $ Sand5  : num  0 31.1 0 27.1 57.2 ...
##  $ Silt1  : num  45.8 65.2 45.8 68.7 54.5 ...
##  $ Silt2  : num  37.3 64.9 37.3 60.3 43.5 ...
##  $ Silt3  : num  36.7 64.1 36.7 34 21.1 ...

Vector data structure

str(soils@polygons[1])
## List of 1
##  $ :Formal class 'Polygons' [package "sp"] with 5 slots
##   .. ..@ Polygons :List of 1
##   .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
##   .. .. .. .. ..@ labpt  : num [1:2] 514199 291168
##   .. .. .. .. ..@ area   : num 10776
##   .. .. .. .. ..@ hole   : logi FALSE
##   .. .. .. .. ..@ ringDir: int 1
##   .. .. .. .. ..@ coords : num [1:21, 1:2] 514211 514206 514195 514178 514180 ...
##   .. ..@ plotOrder: int 1
##   .. ..@ labpt    : num [1:2] 514199 291168
##   .. ..@ ID       : chr "0"
##   .. ..@ area     : num 10776

*You probably don't want to call head() or str() on a large spatial object, as this spits out the first six features and all their attributes

Vector data structure

Index the first feature with a slice (note this will also grab its data)

poly_1 = soils[1,]

Use the subset() function just as you would on a normal dataframe

silty = subset(soils, Silt1 > 70)

Plotting

plot(
  soils,
  main="Soils from Western WI",
  col=rainbow(5)
)












Working with Vector Data

Overlay Analysis

Scenario: we are tasked with finding wells susceptible to contamination, that is wells location in areas of sandy soils.

# Pseudo-code
1) Read in point data
1.5) Covert to spatial data
2) Read in soils
3) Perform relational analysis to find soil properties at well locations 
4) Select those wells with high sand

Vector Overlay

wells = read.delim("./data/WellLocations.tsv")
class(wells); head(wells)
## [1] "data.frame"
##           x        y pts.data.id
## 1 -90.05145 43.10047           1
## 2 -90.05553 43.10470           2
## 3 -90.07305 43.09013           3
## 4 -90.04716 43.08454           4
## 5 -90.07198 43.08850           5
## 6 -90.06599 43.09197           6
coordinates(wells) <- ~ x + y
class(wells)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

Vector Overlay

Plot out to see where our points lie

soils = readOGR(
    dsn="data",
    layer="soilsData")
plot(
  soils,
  main="Soils from Western WI",
  col=rainbow(5)
)
plot(
    wells,
    add=T
)












Hmmmmm, where are the well points?

Coordinate Reference Systems in Brief

Coordinate reference systems are ways of referencing X and Y (longitude and latitdue) to specific points on the Earth. When they don't match, identical points won't lay on top of one another.

print(wells@proj4string)
## CRS arguments: NA
print(soils@proj4string)
## CRS arguments:
##  +proj=tmerc +lat_0=0 +lon_0=-90 +k=0.9996 +x_0=520000
## +y_0=-4480000 +datum=NAD83 +units=m +no_defs +ellps=GRS80
## +towgs84=0,0,0
print(coordinates(soils)[1:2]);print(coordinates(wells)[1:2]);
## [1] 514198.6 515864.4
## [1] -90.05145 -90.05553

Coordinate Reference Systems (cont)

Solution: define the CRS then project the points to CRS of the soils data

wells@proj4string = CRS(
    "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
wells = spTransform(
    wells,
    soils@proj4string)
plot(
  soils,
  col=rainbow(5)
)
plot(
    wells,
    add=T
)












Vector Overlay

Spatial Relationships

We'll select those soils polygons that are within a distance of each well. Then examine them. Plot with size of point related to amount of average sand content.

Vector Overlay

Spatial Relationships

Or we can use a higher level function to extract the soils data to the points.

Vector Analysis

Further Plotting

Change dataset to voting wards for Wisconsin 24th Senate District, mix of census demography data and election results.

wards = readOGR(
    dsn="data",
    layer="WardData"
)
names(wards@data)
##  [1] "NAME_x"    "ASM"       "SEN"       "CON"       "MCD_NAM"  
##  [6] "PERSONS_"  "WHITE"     "BLACK"     "HISPANIC"  "ASIAN"    
## [11] "AMINDIAN"  "PISLAND"   "OTHER"     "OTHERMLT"  "PERSONS1" 
## [16] "WHITE18"   "BLACK18"   "HISPANIC1" "ASIAN18"   "AMINDIAN1"
## [21] "PISLAND1"  "OTHER18"   "OTHERMLT1" "CNTY_NA"   "PRES_TO"  
## [26] "PRES_RE"   "PRES_DE"   "PRES_CO"   "PRESIND1"  "PRESIND2" 
## [31] "PRESIND3"  "PRESIND4"  "PRESIND5"  "PRESIND6"  "PRESSCA"  
## [36] "SEN_TOT"   "SEN_REP"   "SEN_DEM"   "SEN_IND1"  "SEN_IND2" 
## [41] "SEN_IND3"  "SEN_CON"   "SEN_SCA"   "CON_TOT"   "CON_REP"  
## [46] "CON_DEM"   "CON_IND"   "CON_SCA"   "SS_TOT_"   "SS_REP_"  
## [51] "SS_DEM_"   "SS_IND_"   "SS_SCAT"   "ASM_TOT"   "ASM_REP_" 
## [56] "ASM_DEM_"  "ASM_IND1"  "ASM_SCA"   "ASM_DEM2"  "ASM_IND2" 
## [61] "ASM_REP2"  "DA_TOT_"   "DA_REP_"   "DA_DEM_"   "DA_IND_"  
## [66] "DA_SCAT"   "DA_DEM2"

Make a Map

Show percent of vote that was democrat with an indicator of voter turnout.

options(stringsAsFactors=F)

library(rgdal)
library(rgeos)
library(foreign)
library(classInt)
library(RColorBrewer)
library(scales)
source("./misc_scripts/function_proper_legend.r")
wards@data$PERC_DEM = with(wards@data, SEN_DEM/SEN_TOT)
wards@data$PERC_TURN = with(wards@data, SEN_TOT/PERSONS1)

wards_centroids = gCentroid(wards, byid=T)
wards_centroids = SpatialPointsDataFrame(
    gCentroid(wards, byid=T), 
    over(wards_centroids, wards)
)

Make a Map (cont)

## defining number of classes
num_classes = 6
## the color palette
pal = brewer.pal(num_classes, "RdBu")
## the class intervals to use for the colors
class_ints = classIntervals(wards@data$PERC_DEM, num_classes)
## grab the colors for plotting
colrs = findColours(class_ints, pal)

legtxt = properLegend(colrs, 2)

plot(wards,
     col=colrs,
     main="Senate 24",
     border=NA)

Make a Map (cont)

plot(wards_centroids,
     pch=20,
     cex=(wards_centroids@data$PERC_TURN),
     col=alpha('black', 0.5),
     add=T)

legend("topleft",
    legtxt,
    title="Proportion Democrat",
    fill=pal,
    bty='n'
)
legend("topright",
       c("20%", '50%', '80%'),
       pch=20,
       pt.cex=c(0.2, 0.5, 0.8),
       col=alpha('black', 0.5),
       bty='n',
       title="Percent\nTurnout"
)

The map

Vectors

for modeling…

shape = readOGR(dsn, layer) neigborhood_binary = poly2nb(shape) list_of_weights = sb2listw(neighborhood_binary) moran.test(shape$Column, list_of_weights, alternative="two.sided") spat_lin_reg = spautolm(depVar ~ indVar, data=shape, family="SAR", listw=list_of_weights)

Gotchas

  • *SpatialObjects vs SpatialObjectsDataFrames

Raster Data

Intro

A raster grid is rectangular.

Grid is another word for matrix.

Grid is another word for image.

A GIS raster grid is a matrix/image with an associated location and projection.

Intro

At a minimum, a GIS raster grid contains:

  1. matrix of values
  2. projection
  3. reference point, often (x,y) of the lower-left corner
  4. cellsize









Raster I/O

The rgdal rgdal packages is primarily for I/O and projecting GIS data

library(rgdal)

The raster package does everything rgdal does, but it includes lots of additional functionality.

library(raster)

Raster I/O

elev = readGDAL("data/dem_wi.tif")
writeGDAL(elev, "data/dem_wi_out.tif")
elev = raster("data/dem_wi.tif")
writeRaster(elev, "data/dem_wi_out.tif")

Raster data structure

The raster object elev has all the necessary pieces of spatial information:

elev
## class       : RasterLayer 
## dimensions  : 284, 387, 109908  (nrow, ncol, ncell)
## resolution  : 0.01666667, 0.01666667  (x, y)
## extent      : -93.03262, -86.58262, 42.3949, 47.12823  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## data source : /home/devans/Documents/GeRgraphyPresentation/data/dem_wi.tif 
## names       : dem_wi 
## values      : 175, 565.4104  (min, max)

Raster data structure

Which means we can make a map!

m = leaflet() %>%
  addTiles() %>%
  addRasterImage(elev, opacity=0.5)
m

Raster data structure

Raster analysis

Remember that rasters are just matrices!

Therefore, most matrix operations can be applied to rasters. For example:

plot(
  elev > 400,
  col=c("red", "blue")
)








Raster analysis

Rasters can be easily converted to matrices to do more complex work.

lat_grad = apply(
  as.matrix(elev),
  1,
  mean,
  na.rm=T
)
plot(lat_grad, type="l")






Raster overlay

Most raster analysis ultimately executes some sort of overlay.

The issue:

To overlay two or more rasters, their projections, extents, and cellsizes must align perfectly.

This can be a difficult task.

Raster overlay

coordinate systems

What is the highest point in each county?

# Pseudo-code
1. Read in elevation data (raster grid)
2. Read in county boundary data (polygons)
3. Convert counties to raster grid that aligns with elevation grid
4. Find maximum elevation gridcell within each county

Raster overlay

coordinate systems

counties = readOGR("data", "WI_Counties")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "WI_Counties"
## with 72 features
## It has 7 fields
elev
## class       : RasterLayer 
## dimensions  : 284, 387, 109908  (nrow, ncol, ncell)
## resolution  : 0.01666667, 0.01666667  (x, y)
## extent      : -93.03262, -86.58262, 42.3949, 47.12823  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
## data source : /home/devans/Documents/GeRgraphyPresentation/data/dem_wi.tif 
## names       : dem_wi 
## values      : 175, 565.4104  (min, max)

Raster overlay

coordinate systems

proj4string(counties)
## [1] "+proj=tmerc +lat_0=0 +lon_0=-90 +k=0.9996 +x_0=520000 +y_0=-4480000 +ellps=GRS80 +units=m +no_defs"
proj4string(elev)
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Raster overlay

coordinate systems

extent(counties)
## class       : Extent 
## xmin        : 294839 
## xmax        : 770036.4 
## ymin        : 225108.8 
## ymax        : 734398.4
extent(elev)
## class       : Extent 
## xmin        : -93.03262 
## xmax        : -86.58262 
## ymin        : 42.3949 
## ymax        : 47.12823

Raster overlay

coordinate systems

cty_grid = rasterize(counties, elev, field="COUNTY_FIP")
summary(cty_grid)
##          layer
## Min.        NA
## 1st Qu.     NA
## Median      NA
## 3rd Qu.     NA
## Max.        NA
## NA's    109908

Raster overlay

coordinate systems

prj = proj4string(elev)
cty_prj = spTransform(counties, prj)
To do this, we use the spTransform function in the sp package.

Raster overlay

coordinate systems

extent(cty_prj)
## class       : Extent 
## xmin        : -92.88924 
## xmax        : -86.8048 
## ymin        : 42.49197 
## ymax        : 47.08077
extent(elev)
## class       : Extent 
## xmin        : -93.03262 
## xmax        : -86.58262 
## ymin        : 42.3949 
## ymax        : 47.12823

Raster overlay

coordinate systems

plot(elev)
plot(cty_prj, add=TRUE)














Raster overlay

coordinate systems

cty_grid = rasterize(counties, elev, field="COUNTY_FIP")
summary(cty_grid)
##          layer
## Min.        NA
## 1st Qu.     NA
## Median      NA
## 3rd Qu.     NA
## Max.        NA
## NA's    109908

Raster overlay

coordinate systems

extent(cty_grid)
## class       : Extent 
## xmin        : -93.03262 
## xmax        : -86.58262 
## ymin        : 42.3949 
## ymax        : 47.12823
extent(elev)
## class       : Extent 
## xmin        : -93.03262 
## xmax        : -86.58262 
## ymin        : 42.3949 
## ymax        : 47.12823

Raster overlay

coordinate systems

library(dplyr)
ovly = data.frame(
  elev=getValues(elev),
  cty=getValues(cty_grid)
)

hi_pt = ovly %>%
  group_by(cty) %>%
  mutate(
    elev = (elev == max(elev, na.rm=T)) * elev
  ) %>%
  ungroup()

elev = setValues(elev, hi_pt[["elev"]])
elev[elev == 0] = NA
hi_pt_sp = rasterToPoints(elev, spatial=T)

Raster overlay

coordinate systems